Describing Data Graphically with R

Mason Garrison

January 26, 2022

Hans Rosling

Summarize

Exploratory Data Analysis

Descriptive Statistics

Examples: Tables, Woodbridge (1845)

Examples

Categorical Variable Displays (Nominal, Ordinal)

Frequency distribution graph

Nicole Oresme (Bishop of Lisieux), circa 1350

Bar Graph/Chart

# Bar chart
# mtcars ships with base R (datasets package), so no extra library is needed
counts <- table(mtcars$gear)
barplot(counts, main="Car Distribution",
    xlab="Number of Gears")
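
The bar chart above plots raw counts. As a minimal sketch, assuming relative frequencies are wanted instead, the same table can be rescaled with prop.table before plotting. The names props and the axis labels are illustrative choices, not part of the original slides.

```r
# Counts of cars by number of forward gears (mtcars is a built-in data set)
counts <- table(mtcars$gear)

# Rescale the counts so they sum to 1 (relative frequencies)
props <- prop.table(counts)

# Same bar chart as above, but the bar heights are now proportions
barplot(props, main = "Car Distribution",
        xlab = "Number of Gears", ylab = "Proportion of Cars")
```

Proportions make charts comparable across data sets of different sizes, while counts keep the sample size visible.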

Stacked Bar Chart

df <- data.frame(
  group = c("Male", "Female", "Child"),
  value = c(25, 25, 50)
  )
head(df)
##    group value
## 1   Male    25
## 2 Female    25
## 3  Child    50
library(ggplot2)
# Barplot
bp<- ggplot(df, aes(x="", y=value, fill=group))+
geom_bar(width = 1, stat = "identity")
bp

Pie Chart

A pie chart showing each state in the United States, part of Playfair’s translation of A Statistical Account of the United States of America by D. F. Donnant.

R Examples

# Pie chart
# Example 1
slices <- c(10, 12,4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
pie(slices, labels = lbls, main="Pie Chart of Countries")

# Example 2
mytable <- table(iris$Species)
lbls <- paste(names(mytable), "\n", mytable, sep="")
pie(mytable, labels = lbls,
        main="Pie Chart of Species\n (with sample sizes)")
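
Pie charts are usually read as shares of a whole, so it can help to put percentages in the labels. The sketch below reuses the values from Example 1; the names pct and lbls_pct are introduced here for illustration only.

```r
# Same values and labels as Example 1
slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")

# Percentage of the total for each slice, rounded to whole numbers
pct <- round(100 * slices / sum(slices))

# Append the percentage to each label, e.g. "US 20%"
lbls_pct <- paste0(lbls, " ", pct, "%")

pie(slices, labels = lbls_pct, main = "Pie Chart of Countries")
```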

Convert Bar Chart into Pie Chart

pie <- bp + coord_polar("y", start=0)

pie

pie + scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))

Additional Resources

Quantitative Variables

Histogram

# Histogram
library(MASS)
variable<-cats$Bwt
hist(variable)

variable <- variable * 2.2   # Convert kilograms to pounds (1 kg is about 2.2 lb)
hist(variable)
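
The shape of a histogram depends on how the values are binned. As a rough sketch, the breaks argument of hist suggests a number of bins; R treats it as a suggestion and rounds to convenient cut points. The bin counts 5 and 20, and the names weight_kg, h_coarse, and h_fine, are arbitrary choices for illustration.

```r
library(MASS)  # provides the cats data set

weight_kg <- cats$Bwt  # body weight in kilograms

# Few, wide bins: a smooth but coarse picture
h_coarse <- hist(weight_kg, breaks = 5,
                 main = "About 5 bins", xlab = "Body weight (kg)")

# Many, narrow bins: more detail, but also more noise
h_fine <- hist(weight_kg, breaks = 20,
               main = "About 20 bins", xlab = "Body weight (kg)")
```

Both histograms contain every observation; only the grouping changes.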

Stemplot

# Stem and Leaf plot
stem(faithful$eruptions,scale=1)
## 
##   The decimal point is 1 digit(s) to the left of the |
## 
##   16 | 070355555588
##   18 | 000022233333335577777777888822335777888
##   20 | 00002223378800035778
##   22 | 0002335578023578
##   24 | 00228
##   26 | 23
##   28 | 080
##   30 | 7
##   32 | 2337
##   34 | 250077
##   36 | 0000823577
##   38 | 2333335582225577
##   40 | 0000003357788888002233555577778
##   42 | 03335555778800233333555577778
##   44 | 02222335557780000000023333357778888
##   46 | 0000233357700000023578
##   48 | 00000022335800333
##   50 | 0370
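
The scale argument of stem controls the length of the display. A minimal sketch, assuming a longer plot is wanted: a larger scale spreads the same values over more stems.

```r
# Default length, as in the output above
stem(faithful$eruptions, scale = 1)

# A larger scale value stretches the display over more rows
stem(faithful$eruptions, scale = 2)
```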

Time Plots
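
A time plot puts time on the x-axis and the measured variable on the y-axis, with points joined in time order. The sketch below uses the built-in Nile series (annual flow of the Nile at Aswan, 1871 to 1970) as a stand-in, since the slides do not name a data set for this section.

```r
# Nile is a built-in time series object (class "ts")
# plot() on a ts object draws the values against time as a connected line
plot(Nile, xlab = "Year", ylab = "Annual flow",
     main = "Time Plot of Nile Flow")
```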

Information                              Aesthetic
Size of Napoleon’s Grande Armée          Width of path
Longitude of the army’s position         x-axis
Latitude of the army’s position          y-axis
Direction of the army’s movement         Color of path
Date of points along retreat path        Text below plot
Temperature during the army’s retreat    Line below plot

Recreation in R

library(tidyverse)
library(lubridate)
library(ggplot2)
library(ggmap)
library(ggrepel)
library(gridExtra)
library(psych)
# Read the Minard data, either directly from GitHub or from local copies
download <- FALSE  # set to TRUE to download
if (download) {
  cities <- read.table("https://raw.githubusercontent.com/andrewheiss/fancy-minard/master/input/minard/cities.txt",
                       header = TRUE, stringsAsFactors = FALSE)
  troops <- read.table("https://raw.githubusercontent.com/andrewheiss/fancy-minard/master/input/minard/troops.txt",
                       header = TRUE, stringsAsFactors = FALSE)
  temps <- read.table("https://raw.githubusercontent.com/andrewheiss/fancy-minard/master/input/minard/temps.txt",
                      header = TRUE, stringsAsFactors = FALSE)
} else {
  cities <- read.table("../dat/cities.txt",
                       header = TRUE, stringsAsFactors = FALSE)
  troops <- read.table("../dat/troops.txt",
                       header = TRUE, stringsAsFactors = FALSE)
  temps <- read.table("../dat/temps.txt",
                      header = TRUE, stringsAsFactors = FALSE)
}

describe(cities)
##       vars  n  mean   sd median trimmed  mad  min  max range  skew kurtosis   se
## long     1 20 30.79 4.08  30.30   30.77 4.74 24.0 37.6  13.6  0.17    -1.31 0.91
## lat      2 20 54.87 0.55  54.95   54.89 0.74 53.9 55.8   1.9 -0.20    -1.18 0.12
## city*    3 20 10.50 5.92  10.50   10.50 7.41  1.0 20.0  19.0  0.00    -1.38 1.32
describe(troops)
##            vars  n     mean        sd  median  trimmed      mad    min      max    range  skew kurtosis       se
## long          1 51    28.97      4.39    28.3    28.56     5.49   24.0     37.7     13.7  0.64    -0.88     0.61
## lat           2 51    54.93      0.51    54.9    54.91     0.74   54.1     55.8      1.7  0.13    -1.29     0.07
## survivors     3 51 91217.65 101718.61 40000.0 72880.49 50408.40 4000.0 340000.0 336000.0  1.31     0.48 14243.45
## direction*    4 51     1.51      0.50     2.0     1.51     0.00    1.0      2.0      1.0 -0.04    -2.04     0.07
## group         5 51     1.43      0.70     1.0     1.29     0.00    1.0      3.0      2.0  1.27     0.15     0.10
describe(temps)
##        vars n   mean    sd median trimmed   mad   min  max range skew kurtosis   se
## long      1 9  30.63  4.30   29.2   30.63  4.15  25.3 37.6  12.3 0.34    -1.56 1.43
## temp      2 9 -15.67 11.10  -20.0  -15.67 13.34 -30.0  0.0  30.0 0.26    -1.66 3.70
## month*    3 9   1.89  0.78    2.0    1.89  1.48   1.0  3.0   2.0 0.15    -1.54 0.26
## day       4 9  14.56  9.46   14.0   14.56 11.86   1.0 28.0  27.0 0.06    -1.72 3.15
## date*     5 9   5.00  2.74    5.0    5.00  2.97   1.0  9.0   8.0 0.00    -1.60 0.91
temps$date <- as.Date(strptime(temps$date,"%d%b%Y"))
temps.nice <- temps %>%
  mutate(nice.label = paste0(temp, "°, ", month, ". ", day))
march.1812.plot.simple <- ggplot() +
  geom_path(data = troops, aes(x = long, y = lat, group = group, 
                               color = direction, size = survivors),
            lineend = "round") +
  geom_point(data = cities, aes(x = long, y = lat),
             color = "#DC5B44") +
  geom_text_repel(data = cities, aes(x = long, y = lat, label = city),
                  color = "#DC5B44") +
  scale_size(range = c(0.5, 10)) + 
  scale_colour_manual(values = c("#DFC17E", "#252523")) +
  guides(color = "none", size = "none") +  # "none" replaces the deprecated FALSE
  theme_nothing()

march.1812.plot.simple

# Change the x-axis limits to match the simple map
temps.1812.plot <- ggplot(data = temps.nice, aes(x = long, y = temp)) +
  geom_line() +
  geom_label(aes(label = nice.label),
             size = 2.5) + 
  labs(x = NULL, y = "° Celsius") +
  scale_x_continuous(limits = ggplot_build(march.1812.plot.simple)$layout$panel_params[[1]]$x.range) +
  scale_y_continuous(position = "right") +
  coord_cartesian(ylim = c(-35, 5)) +  # Add some space above/below
  theme_bw() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text.x = element_blank(), axis.ticks = element_blank(),
        panel.border = element_blank())

temps.1812.plot

# Combine the two plots
both.1812.plot.simple <- gtable_rbind(ggplotGrob(march.1812.plot.simple),
                               ggplotGrob(temps.1812.plot))

both.1812.plot.simple
## TableGrob (24 x 9) "layout": 36 grobs
##     z         cells       name                                         grob
## 1   0 ( 1-12, 1- 9) background      zeroGrob[plot.background..zeroGrob.323]
## 2   5 ( 6- 6, 4- 4)     spacer                               zeroGrob[NULL]
## 3   7 ( 7- 7, 4- 4)     axis-l          absoluteGrob[GRID.absoluteGrob.316]
## 4   3 ( 8- 8, 4- 4)     spacer                               zeroGrob[NULL]
## 5   6 ( 6- 6, 5- 5)     axis-t                               zeroGrob[NULL]
## 6   1 ( 7- 7, 5- 5)      panel                     gTree[panel-1.gTree.314]
## 7   9 ( 8- 8, 5- 5)     axis-b          absoluteGrob[GRID.absoluteGrob.315]
## 8   4 ( 6- 6, 6- 6)     spacer                               zeroGrob[NULL]
## 9   8 ( 7- 7, 6- 6)     axis-r                               zeroGrob[NULL]
## 10  2 ( 8- 8, 6- 6)     spacer                               zeroGrob[NULL]
## 11 10 ( 5- 5, 5- 5)     xlab-t                               zeroGrob[NULL]
## 12 11 ( 9- 9, 5- 5)     xlab-b  zeroGrob[axis.title.x.bottom..zeroGrob.317]
## 13 12 ( 7- 7, 3- 3)     ylab-l    zeroGrob[axis.title.y.left..zeroGrob.318]
## 14 13 ( 7- 7, 7- 7)     ylab-r                               zeroGrob[NULL]
## 15 14 ( 4- 4, 5- 5)   subtitle        zeroGrob[plot.subtitle..zeroGrob.320]
## 16 15 ( 3- 3, 5- 5)      title           zeroGrob[plot.title..zeroGrob.319]
## 17 16 (10-10, 5- 5)    caption         zeroGrob[plot.caption..zeroGrob.322]
## 18 17 ( 2- 2, 2- 2)        tag             zeroGrob[plot.tag..zeroGrob.321]
## 19  0 (13-24, 1- 9) background              rect[plot.background..rect.360]
## 20  5 (18-18, 4- 4)     spacer                               zeroGrob[NULL]
## 21  7 (19-19, 4- 4)     axis-l                               zeroGrob[NULL]
## 22  3 (20-20, 4- 4)     spacer                               zeroGrob[NULL]
## 23  6 (18-18, 5- 5)     axis-t                               zeroGrob[NULL]
## 24  1 (19-19, 5- 5)      panel                     gTree[panel-1.gTree.347]
## 25  9 (20-20, 5- 5)     axis-b          absoluteGrob[GRID.absoluteGrob.348]
## 26  4 (18-18, 6- 6)     spacer                               zeroGrob[NULL]
## 27  8 (19-19, 6- 6)     axis-r          absoluteGrob[GRID.absoluteGrob.351]
## 28  2 (20-20, 6- 6)     spacer                               zeroGrob[NULL]
## 29 10 (17-17, 5- 5)     xlab-t                               zeroGrob[NULL]
## 30 11 (21-21, 5- 5)     xlab-b                               zeroGrob[NULL]
## 31 12 (19-19, 3- 3)     ylab-l                               zeroGrob[NULL]
## 32 13 (19-19, 7- 7)     ylab-r titleGrob[axis.title.y.right..titleGrob.354]
## 33 14 (16-16, 5- 5)   subtitle        zeroGrob[plot.subtitle..zeroGrob.356]
## 34 15 (15-15, 5- 5)      title           zeroGrob[plot.title..zeroGrob.355]
## 35 16 (22-22, 5- 5)    caption         zeroGrob[plot.caption..zeroGrob.358]
## 36 17 (14-14, 2- 2)        tag             zeroGrob[plot.tag..zeroGrob.357]
# Adjust panels
panels <- both.1812.plot.simple$layout$t[grep("panel", both.1812.plot.simple$layout$name)]

# This combined plot is not a map and does not use coord_equal,
# so the panel heights can be any relative values we like, such as a 3:1 ratio
both.1812.plot.simple$heights[panels] <- unit(c(3, 1), "null")

grid::grid.newpage()
grid::grid.draw(both.1812.plot.simple)

Side by Side

More Accessible Resources

R Basics

Installation

R can be downloaded from one of the mirror sites listed at http://cran.r-project.org/mirrors.html. Pick the location nearest to you.

Using External Data

R offers plenty of options for loading external data, including Excel, Minitab, and SPSS files. We have included a tutorial titled Data Import on the subject.

R Session

After R is started, the console waits for input. At the prompt (>), you can enter numbers and perform calculations.

1 + 2
## [1] 3

Variable Assignment

We assign values to variables with the assignment operator “=”. Typing the variable name by itself at the prompt prints its value. Another assignment operator, “<-”, is also in use.

x = 1
x
## [1] 1

Functions

An R function is invoked by its name, followed by parentheses containing zero or more arguments. The following applies the function c to combine three numeric values into a vector.

c(1, 2, 3)
## [1] 1 2 3

Comments

All text after the pound sign “#” on the same line is considered a comment.

1 + 1   # this is a comment
## [1] 2

Extension Package

Sometimes we need functionality beyond what the core R library offers. To install an extension package, invoke the install.packages function at the prompt and follow the instructions.

install.packages()

Getting Help

R provides extensive documentation. For example, entering ?c or help(c) at the prompt shows the documentation for the function c. Please give it a try.

help(c)

If you are not sure about the name of the function you are looking for, you can perform a fuzzy search with the apropos function.

apropos("nova")
## [1] "anova" "anova.glm" ….
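
As a small illustration of the note on external data above, the sketch below writes a few rows to a temporary CSV file and reads them back with read.csv. The file is created only for this example, and the names tmp and dat are hypothetical; in practice you would pass the path to your own file.

```r
# Create a temporary CSV file holding a few rows of a built-in data set
tmp <- tempfile(fileext = ".csv")
write.csv(head(mtcars[, c("mpg", "gear")]), tmp, row.names = FALSE)

# Read the file back into a data frame and inspect its structure
dat <- read.csv(tmp)
str(dat)

# Remove the temporary file
unlink(tmp)
```

Other formats such as Excel or SPSS generally need an extension package, installed with install.packages as described above.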

References

Minard, Charles Joseph. 1869. Carte figurative des pertes successives en hommes de l’Armée Française dans la campagne de Russie 1812-1813. Graphics Press.
Pearson, Karl. 1895. “Contributions to the Mathematical Theory of Evolution. II. Skew Variation in Homogeneous Material.” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 186: 343–414. https://doi.org/10.1098/rsta.1895.0010.
Tukey, John W. 1977. Exploratory Data Analysis.
Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19 (1): 3–28. https://doi.org/10.1198/jcgs.2009.07098.
Woodbridge, William C. 1845. School Atlas, To Accompany Modern School Geography. Hartford: Belknap; Hamersley.